Specification-Based Data Reduction in Dimensional Data Warehouses
Authors
Abstract
Many data warehouses contain massive amounts of data and grow rapidly. Examples include warehouses with retail sales data capturing customer behavior and warehouses with click-stream data capturing user behavior on web sites. The sheer size of these warehouses makes them increasingly hard to manage and query efficiently. As time passes, old, detailed data in the warehouses tends to become less interesting. However, higher-level summaries of the data, which take up far less space, continue to be of interest. Thus, it is possible to perform data reduction in dimensional data warehouses by aggregating data to higher levels in the dimensions. This paper presents an effective technique for data reduction that handles the gradual change of the data from new, detailed data to older, summarized data. The technique enables huge storage gains while ensuring the retention of essential data. The data reduction is based on formal specifications of when data should be aggregated to higher levels. Care is taken to ensure that the irreversible data reductions do not introduce semantic problems. The paper defines how queries over the resulting data, which has varying levels of detail, are handled, and describes a strategy for implementing the technique using standard data warehouse technology.
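The idea of aggregating facts to higher dimension levels as they age can be illustrated with a minimal Python sketch. It is not the paper's formal specification language or implementation strategy: the `SPEC` thresholds, the day/month/year time hierarchy, the `(day, product, amount)` fact layout, and all function names are assumptions made only for illustration. The sketch uses SUM, which can be re-aggregated safely from partial sums; other measures may need extra care.

```python
from collections import defaultdict
from datetime import date

# Hypothetical reduction specification: facts at least `min_age_days` old
# are rolled up to the named time level. Listed from coarsest to finest.
SPEC = [(365, "year"), (30, "month")]


def roll_up(day, level):
    """Map a day to its member at the given level of the time dimension."""
    if level == "year":
        return f"{day.year:04d}"
    if level == "month":
        return f"{day.year:04d}-{day.month:02d}"
    return day.isoformat()  # "day" level: the fact keeps full detail


def target_level(day, today, spec=SPEC):
    """Pick the coarsest level whose age threshold the fact has reached."""
    age = (today - day).days
    for min_age, level in spec:
        if age >= min_age:
            return level
    return "day"


def reduce_facts(facts, today, spec=SPEC):
    """Aggregate (day, product, amount) facts according to the specification.

    Returns a dict keyed by (level, time_member, product) with summed amounts,
    so the result mixes detailed and summarized rows.
    """
    totals = defaultdict(float)
    for day, product, amount in facts:
        level = target_level(day, today, spec)
        totals[(level, roll_up(day, level), product)] += amount
    return dict(totals)


if __name__ == "__main__":
    facts = [
        (date(2024, 6, 10), "A", 5.0),  # recent: stays at day level
        (date(2024, 3, 1), "A", 2.0),   # older than 30 days: month level
        (date(2024, 3, 20), "A", 3.0),  # same month, merged with the row above
        (date(2022, 1, 5), "A", 1.0),   # older than a year: year level
    ]
    for key, amount in sorted(reduce_facts(facts, date(2024, 6, 15)).items()):
        print(key, amount)
```

Because each reduced row records the level it was aggregated to, a query at month or year granularity can combine detailed and summarized rows, whereas a day-level query can only be answered for data that has not yet been reduced.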
Similar resources
Efficient Storage and Management of Environmental Information
Spatial data warehouses pose many challenging requirements with respect to the design of the data model due to the nature of analytical operations and the nature of the views to be maintained by the spatial warehouse. The first challenge is due to the multi-dimensional nature of each dimension itself. In a traditional data warehouse the various dimensions contributing to the warehouse data are ...
High-dimensional Hierarchical OLAP: A Prefix-Index Hierarchical Cubing Approach
The pre-computation of data cubes is critical for improving the response time of OLAP (online analytical processing) systems and accelerating data mining tasks in large data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In a high dimensional OLAP, it might not be practical to build all th...
ASM Ground Model and Refinement for Data Warehouses
Data Warehouses and on-line analytical processing (OLAP) systems are a promising area for the application of Abstract State Machines (ASMs). In this paper a ground model specification for data warehouses is sketched that is based on the fundamental idea of separating input from operational databases and output to OLAP systems. On this basis we start defining formal refinement rules for such sys...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
Corrugated Box Production Process Optimization Using Dimensional Analysis and Response Surface Methodology
Response surface methodology (RSM) is a statistical method useful in the modeling and analysis of problems in which the response variable is influenced by several independent variables, in order to determine the conditions under which these variables should operate to optimize a corrugated box production process. The purpose of this research is to create response surface mode...